Abstract


We  consider  the  problem  of  identifying  the  class  of  time  series  model  to  which  a  series 
belongs  based  on  observation  of  part  of  the  series.  Techniques  of  nonparametric  estimation 
have  been  applied  to  this  problem  by  various  authors  using  kernel  estimates  of  the  one-step 
lagged  conditional  mean  and  variance  functions.  We  study  cumulative  versions  of  Tukey 
regressogram  estimators  of  such  functions.  These  are  more  stable  than  estimates  of  the 
mean  and  variance  functions  themselves  and  can  be  used  to  construct  confidence  bands. 
Goodness-of-fit  tests  for  specific  parametric  models  are  also  developed. 


1.  Introduction 


Currently one of the most challenging problems in nonlinear time series analysis
is to identify the class of time series model to which a series $\{X_t\}$ belongs based on
observation of part of the series, $\{X_t,\ t = 0, 1, \ldots, n\}$. Techniques of nonparametric
estimation have been applied to this problem by Robinson (1983), who studied the
large sample properties of kernel estimators of lagged conditional means $E(X_t \mid X_{t-j})$
and $E(X_t \mid X_{t-j}, X_{t-k})$ for various $j$ and $k$ values. Such estimators are useful for
detecting nonlinearities graphically; see Tong (1990, p. 12). This approach has been
further developed by Auestad and Tjøstheim (1990), who focused on kernel estimates
of the one-step lagged conditional mean and variance functions $\lambda(x) = E(X_t \mid X_{t-1} = x)$
and $\gamma(x) = \mathrm{var}(X_t \mid X_{t-1} = x)$ for the purpose of identifying common nonlinear
models such as threshold (Tong, 1983) and exponential autoregressive (Ozaki, 1980).

In the present paper we introduce an approach to this problem based on
estimation of cumulative versions of the conditional mean and variance functions,
$\Lambda(\cdot) = \int_a^{\cdot} \lambda(x)\,dx$ and $\Gamma(\cdot) = \int_a^{\cdot} \gamma(x)\,dx$, where $a$ is an appropriately chosen point
in the state space. The estimators, denoted $\hat\Lambda$ and $\hat\Gamma$, are obtained by integrating
Tukey regressograms for $\lambda$ and $\gamma$. The reason for considering cumulative versions of
the conditional mean and variance is that it is possible to derive functional limit
theorems, whereas available asymptotic results for kernel or regressogram estimators
of $\lambda$ and $\gamma$ are only useful pointwise. We advocate $\hat\Lambda$ and $\hat\Gamma$ as natural ‘signatures’
of a time series in preference to estimates of $\lambda$ and $\gamma$.

We derive functional limit theorems for $\hat\Lambda$ and $\hat\Gamma$ under conditions that can
be readily checked when $\{X_t\}$ is a Markov chain. These results can be used to
construct confidence bands, which are more helpful than confidence intervals in
assessing plots. This is the chief benefit from estimating cumulative conditional
means and variances rather than $\lambda$ and $\gamma$ themselves. Another benefit is that $\hat\Lambda$
and $\hat\Gamma$ are relatively insensitive to variations in bandwidth compared to the kernel
or regressogram estimators.

We also consider the problem of testing whether the regression function $\lambda$ has
a specific parametric form. Klimko and Nelson (1978) developed consistency and
asymptotic distribution results for the conditional least squares estimator $\hat\theta$ of $\theta$ for
the parametric model $\lambda(x) = g(\theta, x)$, where $g$ is a known function and $\theta$ is an
unknown parameter. We construct a goodness-of-fit test for this model based on
a comparison of $\hat\Lambda$ and a smoothed version of $\int_a^{\cdot} g(\hat\theta, x)\,dx$, denoted $\tilde\Lambda$. Here $\tilde\Lambda$ is
the natural estimator of $\Lambda$ under the parametric model. We derive a functional
limit theorem for the process $\sqrt{n}(\hat\Lambda - \tilde\Lambda)$. As a particular application we give a test
for linearity of $\lambda$. Robinson (1983) has given a test for linearity at finitely many
locations; other formal tests for linearity are parametric, constructed by arranging
the linear model to be nested within various larger parametric models; see Tong
(1990, Section 5.2).

There are some connections between the present paper and cumulative hazard
function estimation in survival analysis; see the survey articles of Andersen and
Borgan (1985) and McKeague and Utikal (1990a). In fact $\hat\Lambda$ is closely related to
an estimator introduced by McKeague and Utikal (1990b). Martingale techniques
play an important role here, as they do in survival analysis.

Our asymptotic distribution results for $\hat\Lambda$ and $\hat\Gamma$ are given in Section 2. The
goodness-of-fit  test  for  parametric  submodels  is  discussed  in  Section  3.  We  indicate 
how  our  results  can  be  extended  to  lags  of  higher  order  in  Section  4.  The  results  of 
a  simulation  study  and  some  applications  to  real  data  are  presented  in  Section  5. 
Proofs  are  given  in  Section  6. 

2. Estimation of $\Lambda$ and $\Gamma$

Assume that the conditional mean and variance of $X_t$ given $X_0, X_1, \ldots, X_{t-1}$
only depend on $X_{t-1}$. This property holds, for example, if $\{X_t\}$ is a Markov chain.
In particular, an important example is the nonlinear autoregressive process

$$X_t = \lambda(X_{t-1}) + \sigma(X_{t-1})\,\epsilon_t, \tag{1.1}$$

where $\{\epsilon_t\}$ are iid with zero mean and unit variance and $\gamma = \sigma^2$. In this case
the time series is characterized by the triplet ($\lambda$, $\gamma$, distribution of $\epsilon_0$). We are
primarily interested in $\lambda$ and $\gamma$. It is assumed throughout that $\{X_t\}$ is stationary
with a marginal density denoted $f$.

We restrict attention to estimation of $\Lambda$ and $\Gamma$ on a fixed interval $[a, b]$. The
regressogram estimators $\hat\lambda$ and $\hat\gamma$ are defined as follows. Let $\mathcal{I}_1, \ldots, \mathcal{I}_{d_n}$ be a
partition of $[a, b]$ made up of intervals of equal length $w_n$, the bins of the regressogram,
and denote $\mathcal{I}_x = \mathcal{I}_j$ for $x \in \mathcal{I}_j$. Set

$$\hat\lambda(x) = (n w_n \hat f(x))^{-1} \sum_{t=1}^{n} I\{X_{t-1} \in \mathcal{I}_x\}\, X_t,$$

$$\hat\gamma(x) = (n w_n \hat f(x))^{-1} \sum_{t=1}^{n} I\{X_{t-1} \in \mathcal{I}_x\}\,(X_t - \hat\lambda(x))^2,$$

where $\hat f$ is the histogram estimator of $f$ given by

$$\hat f(x) = (n w_n)^{-1} \sum_{t=1}^{n} I\{X_{t-1} \in \mathcal{I}_x\},$$

and $I\{\cdot\}$ is the indicator function. Regressogram estimators were introduced by
Tukey (1961) and have been studied recently by Diebolt (1990).

Introduce the estimators

$$\hat\Lambda(\cdot) = \int_a^{\cdot} \hat\lambda(x)\,dx \quad\text{and}\quad \hat\Gamma(\cdot) = \int_a^{\cdot} \hat\gamma(x)\,dx.$$

Although it is possible to use the more sophisticated kernel estimators to yield
better estimates of $\lambda$ and $\gamma$, there is little to be gained from using them in $\hat\Lambda$
and $\hat\Gamma$, which are less sensitive to variations in $\hat\lambda$ and $\hat\gamma$. We prefer the regressogram
estimators due to their computational simplicity. In practice, care needs to be taken
in choosing the interval $[a, b]$ and the bins to ensure that the regressogram estimates
are not too unstable. For good results, the binwidths should be of comparable size
(we have taken them to be of equal size merely to simplify the notation), and
there should be at least 5 observations per bin.
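The estimators above are simple enough to compute directly. The following is a minimal Python sketch, not the authors' implementation: the function names `regressogram` and `cumulate` are hypothetical, the bins are taken as half-open intervals closed on the left (the proofs use intervals closed on the right, which makes no practical difference for continuous data), and empty bins are treated as contributing zero to the cumulative sums, which is an assumption of this sketch.

```python
def regressogram(x, a, b, d):
    """Regressogram estimates on d equal-width bins of [a, b].

    x holds the observed series X_0, ..., X_n. Returns per-bin lists
    (f_hat, lam_hat, gam_hat) and the bin width w. Bins that receive no
    observations get None for lam_hat and gam_hat.
    """
    n = len(x) - 1
    w = (b - a) / d
    counts = [0] * d
    sums = [0.0] * d
    members = [[] for _ in range(d)]
    for t in range(1, n + 1):
        prev = x[t - 1]
        if a <= prev < b:
            j = min(int((prev - a) / w), d - 1)
            counts[j] += 1
            sums[j] += x[t]
            members[j].append(x[t])
    # Histogram density estimate: bin count divided by n * w.
    f_hat = [c / (n * w) for c in counts]
    # Conditional mean estimate: average of X_t over the bin.
    lam_hat = [s / c if c > 0 else None for s, c in zip(sums, counts)]
    # Conditional variance estimate: mean squared deviation within the bin.
    gam_hat = [
        sum((v - lam) ** 2 for v in vals) / len(vals) if vals else None
        for vals, lam in zip(members, lam_hat)
    ]
    return f_hat, lam_hat, gam_hat, w


def cumulate(values, w):
    """Integrate a piecewise-constant function from a to each right bin edge."""
    total, out = 0.0, []
    for v in values:
        total += (v if v is not None else 0.0) * w  # empty bins count as zero
        out.append(total)
    return out
```

Calling `cumulate(lam_hat, w)` and `cumulate(gam_hat, w)` gives the values of the cumulative estimators at the right-hand bin edges; checking that every entry of the bin counts is at least 5 before trusting the output follows the practical advice given above.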

Ideally, in order to carry out inference on $\Lambda$, using a confidence band for $\Lambda$ say,
we would like to find the limiting distribution of $\sqrt{n}(\hat\Lambda - \Lambda)$. However, for technical
reasons we are only able to obtain a satisfactory weak convergence theory when $\Lambda$
is replaced by the smoothed version of $\Lambda$ given by $\Lambda^*(\cdot) = \int_a^{\cdot} \lambda^*(x)\,dx$, where

$$\lambda^*(x) = \int_{\mathcal{I}_x} f^*(u)\lambda(u)\,du \Big/ \int_{\mathcal{I}_x} f^*(u)\,du$$

and $f^*$ is the histogram estimator of $f$ determined by a finer partition of $[a, b]$
consisting of intervals of equal length $w_n^*$.

We regard $\Lambda^*$ as a ‘surrogate’ for $\Lambda$, which is reasonable since $\Lambda^*$ converges
uniformly in probability to $\Lambda$. However, $\sqrt{n}(\Lambda^* - \Lambda)$ may not be asymptotically
negligible: see the remark following the proof of Theorem 2.1. If it is (for example,
if $\lambda$ is piecewise constant over $\mathcal{I}_1, \ldots, \mathcal{I}_{d_n}$ for some $n$) then $\Lambda^*$ is not needed and
we can deal with $\Lambda$ directly. Similar comments can be made concerning $\hat\Gamma$, with $\Gamma^*$
defined in a similar way to $\Lambda^*$.

We now proceed to state the main results of this section, giving the asymptotic
distributions of $\hat\Lambda$ and $\hat\Gamma$. It is assumed throughout that $\lambda$ and $\gamma$ are Lipschitz. We
also need:

Condition A

(A1) $EX_0^4 < \infty$.

(A2) $(X_0, X_t)$ has a bounded joint density $f_t$ for all $t \ge 1$, and the marginal
density $f$ is continuous and does not vanish on $[a, b]$.

(A3) $\sup_{x \in [a,b]} \mathrm{var}[\hat f(x)] = o(w_n)$.

THEOREM 2.1. Suppose that Condition A holds, $nw_n \to \infty$, $nw_n^4 \to 0$ and $w_n^* \sim w_n^2$
as $n \to \infty$. Then $\sqrt{n}(\hat\Lambda - \Lambda^*)$ converges in distribution in $C[a,b]$ to a continuous
Gaussian martingale with mean zero and variance function

$$H(\cdot) = \int_a^{\cdot} \frac{\gamma(x)}{f(x)}\,dx.$$

THEOREM 2.2. Suppose that the hypotheses of Theorem 2.1 hold, except that
$nw_n^2 \to \infty$ and $EX_0^8 < \infty$. Then $\sqrt{n}(\hat\Gamma - \Gamma^*)$ converges in distribution in $C[a,b]$ to
a continuous Gaussian martingale with mean zero and variance function $\int_a^{\cdot} \nu/f\,dx$,
where $\nu(x) = \mathrm{var}([X_t - \lambda(x)]^2 \mid X_{t-1} = x)$ and $\nu$ is assumed to be Lipschitz.

Checking Condition (A3): A large class of stationary Markov processes $\{X_t\}$ that
satisfy Condition (A3) is described by Auestad and Tjøstheim (1990), who show
(pp. 680, 681) that strong mixing with a geometric mixing rate implies $\mathrm{var}[\hat f(x)] =
O((nw_n)^{-1})$ uniformly over $[a, b]$ provided that $f$ is bounded there. Thus (A3)
holds under this mixing condition if $nw_n^2 \to \infty$. In a particular example it will be
easier to check geometric ergodicity (Nummelin, 1984), which implies strong mixing
with a geometric mixing rate. Geometric ergodicity is in turn implied by a readily
checkable condition of Tweedie (1983).

Another way of checking Condition (A3), which is not restricted to Markov
processes, is to verify a mixing condition of Castellana and Leadbetter (1986,
Theorem 3.3). They considered the following dependence index sequence

$$\beta_n = \sup_{x,y \in [a,b]} \sum_{t=1}^{n} |f_t(x,y) - f(x)f(y)|$$

and showed that

$$\mathrm{var}(\hat f(x)) = O\!\left(\frac{\beta_n}{n}\right) + O\!\left(\frac{1}{nw_n}\right)$$

uniformly in $x$. Hence, if $\beta_n = O(d_n)$ and $nw_n^2 \to \infty$, then Condition (A3) holds.
The moment condition (A1) can probably be weakened, but it makes the results
easier to prove.

We  now  mention  some  possible  applications  of  these  results. 

Confidence bands: Condition (A3) implies that $\hat f$ is uniformly consistent (see the
remark at the beginning of Section 6). Thus, using Theorem 2.2, it can be shown
that $\hat H(\cdot) = \int_a^{\cdot} \hat\gamma/\hat f\,dx$ is a uniformly consistent estimator of $H$. Then, by Theorem
2.1, an asymptotic $100(1-\alpha)\%$ confidence band for $\Lambda^*$ is given by

$$\hat\Lambda(x) \pm c_\alpha\, n^{-1/2}\, \hat H(b)^{1/2} \left(1 + \frac{\hat H(x)}{\hat H(b)}\right), \qquad x \in [a, b],$$

where $c_\alpha$ is the upper $\alpha$ quantile of the distribution of $\sup_{t \in [0,1/2]} |B^0(t)|$ and $B^0$
is the Brownian bridge process; see Andersen and Borgan (1985, p. 114). Tables
for $c_\alpha$ can be found in Hall and Wellner (1980). A confidence band for $\Gamma^*$ can be
obtained in a similar way.
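As an illustration of the band construction, here is a minimal Python sketch. It assumes the band has the Hall and Wellner type form $\hat\Lambda(x) \pm c_\alpha n^{-1/2}\hat H(b)^{1/2}(1 + \hat H(x)/\hat H(b))$, that the per-bin estimates are already available as lists, and that every bin has a strictly positive density estimate. The function name `confidence_band` and its argument layout are hypothetical, and the critical value `c_alpha` must be supplied by the user from published tables.

```python
import math


def confidence_band(lam_cum, gam_hat, f_hat, w, n, c_alpha):
    """Return (lower, upper) band values at the right-hand bin edges.

    lam_cum: cumulative mean estimate at the right edge of each bin
    gam_hat, f_hat: per-bin variance and density estimates (f_hat > 0)
    w: bin width; n: sample size; c_alpha: tabulated critical value
    """
    # H_hat accumulates gamma_hat / f_hat over the bins.
    h, total = [], 0.0
    for g, f in zip(gam_hat, f_hat):
        total += (g / f) * w
        h.append(total)
    h_b = h[-1]
    lower, upper = [], []
    for lam, h_x in zip(lam_cum, h):
        half = c_alpha * math.sqrt(h_b / n) * (1.0 + h_x / h_b)
        lower.append(lam - half)
        upper.append(lam + half)
    return lower, upper
```

The band widens from left to right because the factor $1 + \hat H(x)/\hat H(b)$ grows from 1 to 2 over the interval.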

Testing simple hypotheses: A test of the simple hypotheses $\lambda = \lambda_0$ and $\gamma = \gamma_0$,
where $\lambda_0$ and $\gamma_0$ are given, can be made by checking whether the above confidence
bands contain $\Lambda_0$ and $\Gamma_0$. A rather different approach has been taken by Diebolt
(1990), who developed a test based on a piecewise constant version of

f(x)^o(x)dx^ . 

Diebolt obtained a functional limit theorem for this process, and a similar one
designed to test $\gamma = \gamma_0$, where $\gamma_0$ is given and $\lambda$ is known, in the special case of
model (1.1).

Testing for a difference between two regression functions: Consider the “two-sample”
problem of testing whether two independent time series have identical regression
functions $\lambda$. Denote the various functions, sample sizes, estimators, etc. associated
with the two series by using a subscript 1 or 2, as in $\hat\Lambda_j$, $j = 1, 2$. Let $n = n_1 + n_2$.
Then, if $n_j/n \to p_j$ for $j = 1, 2$, and the conditions of Theorem 2.1 are satisfied
for the two series, $\sqrt{n}(\hat\Lambda_1 - \hat\Lambda_2)$ converges in distribution in $C[a,b]$ to a continuous
Gaussian martingale with mean zero and variance function

$$p_1^{-1} \int_a^{\cdot} \frac{\gamma_1(x)}{f_1(x)}\,dx + p_2^{-1} \int_a^{\cdot} \frac{\gamma_2(x)}{f_2(x)}\,dx,$$

provided that $\lambda_1 = \lambda_2$ on $[a, b]$ and $\sqrt{n}(\Lambda_1^* - \Lambda_2^*)$ converges uniformly in probability
to zero. The latter condition holds if the common $\lambda$ is piecewise constant, as
mentioned earlier. Confidence bands for $\Lambda_1^* - \Lambda_2^*$ are constructed as above. Some
plots of such bands are given in Section 5.


3.  Goodness-of-fit  tests  for  parametric  models 

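In this section we consider the problem of testing whether $\lambda$ belongs to a
parametric family $\{g(\theta, \cdot) : \theta \in \Theta\}$ of regression functions. Here $g$ is a known
deterministic function, and $\Theta$ is a closed, bounded subset of $\mathbb{R}^p$. Our test is based
on a functional limit theorem for $\sqrt{n}(\hat\Lambda - \tilde\Lambda)$, where $\tilde\Lambda(z) = \int_a^z \tilde\lambda(x)\,dx$,

$$\tilde\lambda(x) = \int_{\mathcal{I}_x} f^*(u)\, g(\hat\theta, u)\,du \Big/ \int_{\mathcal{I}_x} f^*(u)\,du$$

and $\hat\theta$ is the conditional least squares estimator minimizing $\sum_{t=1}^{n} (X_t - g(\theta, X_{t-1}))^2$.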
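For the linear family $g(\theta, x) = \theta x$ the conditional least squares criterion has a closed-form minimizer, $\hat\theta = \sum_t X_t X_{t-1} / \sum_t X_{t-1}^2$, obtained by setting the derivative of the sum of squares to zero. The short Python sketch below illustrates this special case only; the function name `cls_linear_ar1` is hypothetical, and general families $g$ would need a numerical optimizer instead.

```python
def cls_linear_ar1(x):
    """Conditional least squares estimate of theta in X_t = theta * X_{t-1} + e_t.

    x holds the observed series X_0, ..., X_n. The estimate minimizes
    sum_t (X_t - theta * X_{t-1})**2 over theta.
    """
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den
```

On a noise-free geometric series such as 1, 2, 4, 8 the estimate recovers the ratio 2 exactly, which is a quick sanity check of the formula.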

First we state a version of the consistency and asymptotic normality result
of Klimko and Nelson (1978) that is adapted to our present setting, taking the
opportunity to simplify their approach a little. We assume that $\{X_t\}$ is an ergodic
process and $E(X_1 - g(\theta, X_0))^2$ has a unique minimum at a point $\theta_0$ in the interior
of $\Theta$.

For a matrix $Y$ and a vector $y$, denote $\|Y\| = \sup_{i,j} |Y_{ij}|$, $\|y\| = \sup_i |y_i|$,
and $y^{\otimes 2} = yy^T$. It is assumed that $g(\theta, x)$ is twice differentiable w.r.t. $\theta$ and the
corresponding derivatives are denoted $g'$ and $g''$.

Condition B

(B1) There exists a function $J$ such that $\|g''(\theta, x) - g''(\zeta, x)\| \le J(x)\,\vartheta(\theta - \zeta)$,
where $J(X_0)$ has a finite second moment, and $\lim_{\alpha \to 0} \vartheta(\alpha) = 0$.

(B2) There exists a function $K$ such that $\|g''(\theta, x)\| \le K(x)$, where $K(X_0)$ has
a finite fourth moment.

(B3) $g(\theta_0, X_0)$ and $\gamma(X_0)$ have finite second moments, and all the components
of $g'(\theta_0, X_0)$ have a finite fourth moment.

(B4) The matrices

$$V = E[g'(\theta_0, X_0)^{\otimes 2}],$$

$$S = E[g'(\theta_0, X_0)^{\otimes 2}\,\gamma(X_0)]$$

are positive definite.

THEOREM 3.1. Under Condition B, $\hat\theta \xrightarrow{\text{a.s.}} \theta_0$ and $\sqrt{n}(\hat\theta - \theta_0) \xrightarrow{\mathcal{D}} N(0, V^{-1} S V^{-1})$.

We  now  state  the  main  result  of  this  section. 

THEOREM 3.2. Suppose that Conditions A and B hold and $\lambda(\cdot) = g(\theta_0, \cdot)$. If
$nw_n \to \infty$ and $nw_n^4 \to 0$, then $\sqrt{n}(\hat\Lambda - \tilde\Lambda)$ converges in distribution in $C[a, b]$ to

$$\int_a^{\cdot} \sqrt{\gamma(x)/f(x)}\,dW(x) - \psi(\cdot) \int_{-\infty}^{\infty} g'(\theta_0, x)\sqrt{\gamma(x)f(x)}\,dW(x),$$

where

$$\psi(z) = \int_a^z g'(\theta_0, x)^T\,dx\; V^{-1},$$

and $W$ is the Wiener process extended to the whole real line.

A chi-squared goodness-of-fit test for the parametric model is now easily
constructed. Let $J_1, \ldots, J_L$ be a partition of $[a, b]$ consisting of intervals. Denote the
increment of $\sqrt{n}(\hat\Lambda - \tilde\Lambda)$ over $J_j$ by $\Delta_j$. It can be checked that $\Delta = (\Delta_j)$ converges
in distribution to a Gaussian random vector with mean zero and covariance matrix
having $rl$th entry

$$H(J_r \cap J_l) + \psi(J_r)\,S\,\psi(J_l)^T - \psi(J_r)H_1(J_l) - \psi(J_l)H_1(J_r),$$

where $H$ is defined in Theorem 2.1 and

$$H_1(z) = \int_a^z g'(\theta_0, x)\gamma(x)\,dx.$$

Here the value of a function at an interval denotes its increment over that interval.
Let $G$ be the natural estimate of this covariance matrix obtained by replacing the
unknown $\theta_0$, $f$, and $\gamma$ by their estimates. Then the Wald test statistic $\Delta^T G^{-1} \Delta$
has a limiting $\chi^2_q$ distribution under the parametric model, where $q$ is the rank of
the limiting covariance matrix of $\Delta$. A test for a parametric model of $\gamma$ can be
developed in a similar way.
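Once the increment vector and the covariance estimate are in hand, the Wald statistic is a quadratic form. The following Python sketch computes it in the full-rank case by solving a linear system rather than forming the inverse explicitly. The function names `solve` and `wald_statistic` are hypothetical, the singularity tolerance is an arbitrary choice, and the comparison with a chi-squared critical value is left to the user.

```python
def solve(matrix, rhs):
    """Solve matrix @ y = rhs by Gaussian elimination with partial pivoting."""
    size = len(rhs)
    aug = [list(row) + [r] for row, r in zip(matrix, rhs)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance estimate is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, size):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, size + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    y = [0.0] * size
    for row in range(size - 1, -1, -1):
        tail = sum(aug[row][k] * y[k] for k in range(row + 1, size))
        y[row] = (aug[row][size] - tail) / aug[row][row]
    return y


def wald_statistic(delta, g):
    """Return delta^T G^{-1} delta for a nonsingular covariance estimate g."""
    y = solve(g, delta)
    return sum(d * v for d, v in zip(delta, y))
```

Because an estimated covariance matrix need not be positive definite in small samples, the returned value can be negative; the sketch does not guard against that.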

4.  Extension  to  higher  order  lags 

It is possible to extend our results to higher order lagged conditional means, but
it would be unreasonable to use more than second order lags in practice because of
the “curse of dimensionality”: the data become sparser at an exponential rate as
the dimension increases. We briefly indicate how to handle the second order lagged
conditional mean $\lambda(x, y) = E(X_t \mid X_{t-1} = x, X_{t-2} = y)$. This mostly amounts to
just a reinterpretation of our original notation.

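Denote $\mathbf{X}_t = (X_t, X_{t-1})$ and assume that the conditional mean and variance of
$X_t$ given $X_0, X_1, \ldots, X_{t-1}$ are $\lambda(\mathbf{X}_{t-1})$ and $\gamma(\mathbf{X}_{t-1})$ respectively. The regressogram
estimator of $\lambda$ is

$$\hat\lambda(x, y) = (n w_n^2 \hat f(x, y))^{-1} \sum_{t=2}^{n} I\{\mathbf{X}_{t-1} \in \mathcal{I}_{xy}\}\, X_t,$$

where $\mathcal{I}_{xy} = \mathcal{I}_x \times \mathcal{I}_y$ and

$$\hat f(x, y) = (n w_n^2)^{-1} \sum_{t=2}^{n} I\{\mathbf{X}_{t-1} \in \mathcal{I}_{xy}\}.$$

Here $\hat f$ is a histogram estimate of the density of $\mathbf{X}_1$.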

In order to obtain the asymptotic distribution of $\hat\Lambda = \int_a^{\cdot}\!\int_a^{\cdot} \hat\lambda\,dx\,dy$ we need to
extend Conditions (A2) and (A3). In Condition (A2), $f_t$ is now the joint density
of $\mathbf{X}_1$ and $\mathbf{X}_t$. The rate in (A3) is now $o(w_n^2)$. Castellana and Leadbetter's (1986)
dependence index sequence $\beta_n$ can be extended in the same fashion. If $\beta_n = O(d_n^2)$
and $nw_n^4 \to \infty$, then the extended version of Condition (A3) holds.

The functions $f^*$, $\lambda^*$ and $\tilde\lambda$ are defined much as before, except using a
partition of $[a, b]^2$ consisting of squares with sides of length $w_n^*$, and integrals over $\mathcal{I}_{xy}$.
Let $C[a,b]^2$ denote the space of continuous functions on $[a,b]^2$ provided with the
supremum norm. Our earlier results now extend as follows.

THEOREM 4.1. Suppose that the extended version of Condition A holds, $nw_n^2 \to \infty$,
$nw_n^{12} \to 0$, and $w_n^* \sim w_n^6$. Then $\sqrt{n}(\hat\Lambda - \Lambda^*)$ converges in distribution in $C[a, b]^2$
to a two-parameter Gaussian martingale with zero mean and variance function
$\int_a^{\cdot}\!\int_a^{\cdot} \gamma/f\,dx\,dy$.

THEOREM 4.2. Suppose that the hypotheses of Theorem 4.1 and the extended
version of Condition B hold and $\lambda = g(\theta_0, \cdot)$. Then $\sqrt{n}(\hat\Lambda - \tilde\Lambda)$ converges in
distribution in $C[a, b]^2$ to a process which has the same form as the limiting process
in Theorem 3.2 except that the integrals are with respect to the Wiener sheet
extended to $\mathbb{R}^2$.


5.  Numerical  results  and  examples 

5.1. Simulation study: We have carried out simulations using three model examples
taken from Auestad and Tjøstheim (1990):

Model 1: linear autoregressive, $X_t = 0.8X_{t-1} + \epsilon_t$;

Model 2: threshold autoregressive,

$$X_t = \begin{cases} -0.3X_{t-1} + \epsilon_t, & \text{if } X_{t-1} < 0, \\ 0.8X_{t-1} + \epsilon_t, & \text{if } X_{t-1} \ge 0; \end{cases}$$

Model 3: exponential autoregressive, $X_t = \{0.8 - 1.1\exp(-50X_{t-1}^2)\}X_{t-1} + \epsilon_t$.

Here $\epsilon_t$ is Gaussian white noise with mean zero and standard deviation 0.1.
Auestad and Tjøstheim (1990) checked geometric ergodicity and stationarity for
these examples.
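For readers who wish to reproduce series of this kind, the following Python sketch simulates the three models, reading them as a linear autoregression with coefficient 0.8, a threshold autoregression with coefficients -0.3 and 0.8 below and above zero, and an exponential autoregression with coefficient $0.8 - 1.1\exp(-50x^2)$. The function names, the burn-in length, the starting value and the seed are assumptions of this sketch, and Python's built-in generator is used rather than the generator cited in the note to Table 1.

```python
import math
import random


def lam_linear(x):
    """Regression function of the linear autoregressive model."""
    return 0.8 * x


def lam_threshold(x):
    """Regression function of the threshold autoregressive model."""
    return -0.3 * x if x < 0 else 0.8 * x


def lam_exponential(x):
    """Regression function of the exponential autoregressive model."""
    return (0.8 - 1.1 * math.exp(-50.0 * x * x)) * x


def simulate(lam, n, sd=0.1, burn_in=500, seed=0):
    """Return X_0, ..., X_n from X_t = lam(X_{t-1}) + e_t with Gaussian noise.

    The first burn_in values are discarded so that the retained path is
    approximately stationary (burn_in is an assumed, not a prescribed, value).
    """
    rng = random.Random(seed)
    x = 0.0
    path = []
    for t in range(burn_in + n + 1):
        x = lam(x) + rng.gauss(0.0, sd)
        if t >= burn_in:
            path.append(x)
    return path
```

A call such as `simulate(lam_threshold, 250)` returns 251 values, matching the convention that the series is observed at times $0, 1, \ldots, n$.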

We restricted estimation of $\Lambda$ to the interval $[-0.3, 0.3]$. The binwidth was
taken as $w_n = 0.05$ (same as Auestad and Tjøstheim, who plotted point estimates
of $\lambda$ for these three models). Inspecting the plots of $\hat\Lambda$ in Figure 1, we find that
the three models are easily distinguishable, even for sample size as low as 250. The
parabolic shape of the linear autoregressive model, and the ‘squashed’ parabola of
the exponential autoregressive are especially distinct.

Figure 2 shows plots of differences between the estimates of the cumulative
regression functions in the two-sample problem, for various pairs of the above models.
In  the  first  plot  in  each  row,  the  two  series  are  generated  using  the  linear  model  and 
the  zero  function  is  contained  within  the  band,  so  our  test  would  correctly  conclude 
that  the  regression  functions  are  identical.  In  the  other  plots,  the  zero  function  is 
well  outside  the  bands  and  the  test  correctly  concludes  that  the  regression  functions 
are  different. 

Figure 1. $\hat\Lambda$ with 95% confidence bands; solid lines, $\hat\Lambda$; dotted lines, $\Lambda$; dashed
lines, confidence bands; first row, n = 250; second row, n = 500.

Figure 2. $\hat\Lambda_1 - \hat\Lambda_2$ with 95% confidence bands; first row, n = 250; second row,
n = 500.

Table 1. Observed Levels and Powers of Goodness-of-Fit Test for Linear
Autoregressive Model at Nominal Level of 5%; binwidth, $w_n = 0.05$; L = 4.

                                  Sample Size
Observed Series     100     250     500    1000    1500    2500
Linear           0.0974  0.0604  0.0518  0.0496  0.0426  0.0496
Threshold        0.8674  0.9938  0.9994  1.0000  1.0000  1.0000
Exponential      0.9996  1.0000  1.0000  1.0000

NOTE: The data were generated using the Gaussian random number generator of
Marsaglia and Tsang (1984). The number of samples in each run was 5000.

Table 1 gives observed levels and powers of the chi-squared goodness-of-fit test
for the linear autoregressive model $X_t = \theta X_{t-1} + \epsilon_t$, when the time series is
generated by each model. At small sample sizes (less than 250), the covariance matrix
estimator  G  sometimes  failed  to  be  positive  definite  and  the  chi-squared  statistic 
value  was  negative.  The  percentage  of  negative  chi-squared  statistics  was  2.4%  and 
0.1%  for  sample  sizes  of  100  and  250  with  the  linear  model;  7%  and  3%  with  the 
threshold  model;  3.6%  and  0.94%  with  the  exponential  model.  We  rejected  the 
linear  model  when  the  chi-squared  statistic  was  negative.  This  is  reasonable  since 
G  is  consistent  under  the  null  hypothesis  so  that  a  negative  chi-squared  statistic 
is  evidence  in  favor  of  the  alternative.  The  observed  levels  are  very  close  to  their 
nominal  5%  values  and  the  powers  are  close  to  100%  (except  for  n  =  100)  under 
the  threshold  and  exponential  models. 

5.2.  Canadian  lynx  data:  The  classic  Canadian  lynx  data  set  consists  of  the  annual 
numbers  of  Canadian  lynx  trapped  in  the  Mackenzie  River  district  of  North-west 
Canada  for  the  period  1821-1934.  Various  parametric  time  series  models  have  been 
proposed  to  fit  these  data,  see  Tong  (1990)  for  an  extensive  review.  Moran  (1953) 
fitted  a  second  order  linear  autoregressive  model,  after  first  transforming  by  log10, 
to  obtain 

$$X_t = 1.05 + 1.41X_{t-1} - 0.77X_{t-2} + \epsilon_t,$$

where $\epsilon_t \sim$ iid $(0, 0.04591)$. However, many authors, including Bartlett (1954),
Hannan  (1960),  Campbell  and  Walker  (1977)  and  Tong  (1977),  have  judged  this 
model  to  be  inadequate  compared  with  some  other  parametric  models. 

We carried out our goodness-of-fit test for the second order linear model (having
three parameters) using $d_n = 5, 6, \ldots, 10$, and 4 (2 by 2) and 9 (3 by 3) degrees
of  freedom.  The  bins  were  arranged  to  cover  the  whole  range  of  the  data  and  to 
contain,  as  closely  as  possible,  equal  numbers  of  data  points.  All  our  tests  indicated 
an  extremely  strong  departure  from  the  linear  model. 
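Bins that cover the range of the data and hold roughly equal numbers of points can be obtained from the order statistics. The Python sketch below is one simple way to do this in one dimension; the function name `equal_count_edges` is hypothetical, and the rule for choosing the interior edges is an assumption of this sketch rather than the procedure used for the lynx data.

```python
def equal_count_edges(values, d):
    """Return d + 1 bin edges covering the data with roughly equal counts.

    Interior edges are taken at order statistics, so with heavily tied
    data some edges may coincide.
    """
    ordered = sorted(values)
    m = len(ordered)
    edges = [ordered[0]]
    for j in range(1, d):
        edges.append(ordered[(j * m) // d])
    edges.append(ordered[-1])
    return edges
```

With ten distinct values and five bins, each bin (closed on the left, with the last bin also closed on the right) receives two points.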

5.3.  IBM  stock  price  data:  Consider  the  set  of  IBM  daily  closing  stock  prices  from 
late  1959  to  mid  1960  (period  I)  and  mid  1961  to  early  1962  (period  II)  given  in 
Tong  (1990).  The  daily  relative  change  in  price  appears  to  be  stationary  and  is  used 
in  place  of  the  raw  data.  Tong  (1990)  tested  for  linearity  and  decided  that  period  I 
is  linear  and  period  II  is  nonlinear.  Figure  3  gives  a  plot  of  the  difference  between 
the  estimates  of  the  cumulative  regression  functions  in  the  two  periods,  along  with 
the 95% confidence band, using $d_n = 10$. The confidence band does not contain
the zero function, so we conclude that the regression functions for the two periods
differ significantly from one another. Our chi-squared test with $d_n = 8$, 10 and 12,
and degrees of freedom $L = 2$ and 4, gave the same result.
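The transformation from prices to daily relative changes can be written in one line. The sketch below assumes the relative change is defined as $(P_t - P_{t-1})/P_{t-1}$, which is the usual convention but is not spelled out in the text; the function name `relative_changes` is hypothetical.

```python
def relative_changes(prices):
    """Return the series (P_t - P_{t-1}) / P_{t-1} for t = 1, ..., len(prices) - 1."""
    return [
        (prices[t] - prices[t - 1]) / prices[t - 1]
        for t in range(1, len(prices))
    ]
```

The output is one element shorter than the input, since the first price has no predecessor.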


Figure 3. $\hat\Lambda_1 - \hat\Lambda_2$ with 95% confidence band for IBM stock price data; $d_n = 10$;
$\hat\Lambda_1$ = period I, $\hat\Lambda_2$ = period II.


6.  Proofs 

Recall that the intervals $\mathcal{I}_j$ partition $[a, b]$. We write them explicitly as $\mathcal{I}_j =
(x_{j-1}, x_j]$, $j = 1, \ldots, d_n$. In what follows we need $\hat f$ to be uniformly consistent for
$f$ on $[a, b]$. This holds under Conditions (A2) and (A3) since

$$E\Big(\sup_{x \in [a,b]} |\hat f(x) - f^\dagger(x)|\Big)^2 \le \sum_{j=1}^{d_n} \mathrm{var}[\hat f(x_j)] = d_n\, o(w_n) \to 0,$$

and, by stationarity, $f^\dagger(x) = E\hat f(x) = w_n^{-1} \int_{\mathcal{I}_x} f(u)\,du \to f(x)$ uniformly on $[a,b]$.
Also note that $\xi_t = X_t - \lambda(X_{t-1})$ is a martingale difference with respect to the
natural filtration $\mathcal{F}_t = \sigma(X_0, \ldots, X_t)$.

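PROOF OF THEOREM 2.1. First observe that $\hat f(x) = w_n^{-1} \int_{\mathcal{I}_x} f^*(u)\,du$. Since $\lambda$ is
Lipschitz and $\hat f$ converges uniformly in probability to $f$, which is bounded away
from zero, we have

$$\hat\lambda(x) = [n w_n \hat f(x) w_n^*]^{-1} \int_{\mathcal{I}_x} \sum_{t=1}^{n} I\{X_{t-1} \in \mathcal{I}_u^*\}[\lambda(X_{t-1}) + \xi_t]\,du$$

$$= \lambda^*(x) + O_p(w_n^*) + [n w_n \hat f(x)]^{-1} \sum_{t=1}^{n} I\{X_{t-1} \in \mathcal{I}_x\}\,\xi_t,$$

where $\mathcal{I}_u^*$ denotes the interval of the finer partition containing $u$.
Hence, by $w_n^* \sim w_n^2$ and $nw_n^4 \to 0$,

$$\sqrt{n}(\hat\Lambda - \Lambda^*)(z) = \frac{1}{\sqrt{n}\,w_n} \int_a^z \frac{\sum_{t=1}^{n} I\{X_{t-1} \in \mathcal{I}_x\}\,\xi_t}{\hat f(x)}\,dx + o_p(1)$$

$$= \bar M(n, \cdot)(z) + R(z) + o_p(1)$$

uniformly in $z$, where

$$M(k, z) = \frac{1}{\sqrt{n}} \sum_{j=1}^{[zd_n]} f^\dagger(x_j)^{-1} \sum_{t=1}^{k} I\{X_{t-1} \in \mathcal{I}_j\}\,\xi_t, \qquad k = 1, \ldots, n,$$

$$R(z) = \frac{1}{\sqrt{n}\,w_n} \int_a^z \frac{f^\dagger(x) - \hat f(x)}{\hat f(x) f^\dagger(x)} \sum_{t=1}^{n} I\{X_{t-1} \in \mathcal{I}_x\}\,\xi_t\,dx,$$

and, given a function $\phi$ defined on $[a, b]$, $\bar\phi$ is the piecewise linear approximation to
$\phi$ that agrees with $\phi$ at each $x_j$. Here $[\cdot]$ denotes the integer part and $M(k, z)$ is
defined to be zero when $[zd_n] = 0$. To complete the proof we need to show that
the remainder term $R$ converges uniformly in probability to zero and $M(n, \cdot) \xrightarrow{\mathcal{D}} m$,
where $m$ denotes the Gaussian martingale given in the statement of the theorem,
for Lemma 4.1 of McKeague (1988) then implies that $\bar M(n, \cdot) \xrightarrow{\mathcal{D}} m$.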
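
Now $M(\cdot, z)$ is an $\mathcal{F}_t$-martingale for each fixed $z$. We shall use the martingale
central limit theorem (see Theorem A.2 of Aalen (1977), for instance) to show
that all finite dimensional distributions of $M(n, \cdot)$ converge to those of $m$. The
predictable variation process of $M(\cdot, z)$ evaluated at $k = n$ is given by

$$\langle M(\cdot, z) \rangle_n = \frac{1}{n} \sum_{j=1}^{[zd_n]} f^\dagger(x_j)^{-2} \sum_{t=1}^{n} I\{X_{t-1} \in \mathcal{I}_j\}\,\gamma(X_{t-1})$$

$$= \int_a^z \frac{[\gamma(x) + O(w_n)]\,\hat f(x)}{f^\dagger(x)^2}\,dx + o_p(1) \xrightarrow{P} H(z).$$

Next, we check the Lindeberg condition that

$$L_n = \sum_{t=1}^{n} E\big[(\Delta M_t)^2\, I\{|\Delta M_t| > \epsilon\} \mid \mathcal{F}_{t-1}\big], \qquad \Delta M_t = M(t, z) - M(t-1, z),$$
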

converges in probability to zero for all $\epsilon > 0$. By the conditional Cauchy-Schwarz
and Chebyshev inequalities, and since $f^\dagger$ is bounded away from zero on $[a, b]$, the
conditional expectation in $L_n$ is bounded above by

{ JS7(«? | J=i_i )} * { )]-*£(/{ JV, 

Now (A1), stationarity of $\{X_t\}$ and $\lambda$ Lipschitz imply that $\sup_t E\xi_t^4 < \infty$, so again
using the Cauchy-Schwarz inequality, (A2), boundedness of $f$ and $\gamma$, and $nw_n \to \infty$,
we have


E(Ln)  <  °(z~7=)  €ly)d 

V  j=  1  (  =  1 


0, 


so the Lindeberg condition holds. By the martingale central limit theorem, the one
dimensional distributions of $M(n, \cdot)$ converge to those of $m$. The above argument
readily extends to all finite dimensional distributions of $M(n, \cdot)$ using the fact that
increments of $M(\cdot, z)$ over disjoint intervals in $z$ are orthogonal martingales.

The next step is to show that $\{M(n, \cdot) : n \ge 1\}$ is tight in $D[a,b]$. By a slight
extension of Theorem 15.6 of Billingsley (1968), it suffices to show that

$$E\,|M(n, y) - M(n, x)|^2\, |M(n, z) - M(n, y)|^2 \le C(z - x)^2 + o(1)$$

for $a \le x \le y \le z \le b$, where $C$ is a generic positive constant. Indeed, by the
Cauchy-Schwarz inequality it suffices to show that

$$E\,|M(n, y) - M(n, x)|^4 \le C(y - x)^2 + o(1). \tag{6.1}$$

Using Rosenthal's inequality (Hall and Heyde, 1980, p. 23), the left hand side of
(6.1) is bounded by


CE 


(62 


I{Xt-x  €  2,16V 


where the summation over $j$ runs from $[xd_n] + 1$ to $[yd_n]$. By (A2), the first term
of (6.2) is bounded by


°G)EE£(/W->  e x,.}) 

j  t=l  j,k  s^t 

=  oQ^j(y  -  x)  +  0(l)(y  -  x)2  <  C{y  -  x)2  +  o(l), 
and  the  second  term  of  (6.2)  is  bounded  by 


o(^)  E  EWh  6  1,})He&)i  <  o(-j=)(y  -  X)  =  0(1), 

j  t=  1  V  n 

since $nw_n \to \infty$. So (6.1) holds.

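It only remains to show that $R$ converges uniformly in probability to 0. Since
$\hat f$ is a uniformly consistent estimator of $f$, which is bounded away from zero on
$[a, b]$, it suffices to show that

$$\frac{1}{\sqrt{n}} \sum_{j=1}^{d_n} |\hat f(x_j) - f^\dagger(x_j)|\, \Big| \sum_{t=1}^{n} I\{X_{t-1} \in \mathcal{I}_j\}\,\xi_t \Big| \xrightarrow{P} 0. \tag{6.3}$$

By the Cauchy-Schwarz inequality and (A3), the expectation of (6.3) is bounded
by

$$\frac{1}{\sqrt{n}} \sum_{j=1}^{d_n} \big(\mathrm{var}[\hat f(x_j)]\big)^{1/2} \Big\{ \sum_{t=1}^{n} E\big(I\{X_{t-1} \in \mathcal{I}_j\}\,\xi_t\big)^2 \Big\}^{1/2} = \frac{d_n}{\sqrt{n}}\, o(\sqrt{w_n})\, O(\sqrt{nw_n}) \to 0,$$

as required. □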


Remark: To show the uniform consistency of $\hat f$, we used Condition (A3), which can
be readily checked under $nw_n^2 \to \infty$. However, under $nw_n^2 \to \infty$ we are unable to
show that $\sqrt{n}(\Lambda^* - \Lambda)$ tends uniformly in probability to zero since $\Lambda^* - \Lambda$ is at best
of order $O(w_n)$. Thus we have not been able to obtain an asymptotic distribution
result for $\sqrt{n}(\hat\Lambda - \Lambda)$ in general.


PROOF OF THEOREM 2.2. Define $r_t = \xi_t^2 - \gamma(X_{t-1})$. Since $\lambda$ and $\gamma$ are Lipschitz, and
$\hat f$ converges uniformly in probability to $f$, which is bounded away from 0,


n 

7(1)  =  [n»./(i)]‘I  e  It}  [{,  ■ 


t=l 

n 


ZiHXi-1  €  Jx} 


$$\gamma^*(x) = [n w_n \hat f(x)]^{-1} \sum_{t=1}^{n} I\{X_{t-1} \in \mathcal{I}_x\}\,\gamma(X_{t-1}) + O_p(w_n^*).$$

Noting that $\sqrt{n}\,w_n^* \sim (nw_n^4)^{1/2} \to 0$, we have

€  Jr}r* 


VS(f  -  nw  =  -J—  f 

\fnwn  Ja 


/(*) 


dx 


+  Op(-7)  J'Y'HX,-!  €Z,}£,<fa  +  op(l) 


(6.4) 


uniformly in $z$. The second term in (6.4) is uniformly bounded by


Op(^)E[EW-«7}6 
=  0'>(^57T^)0p("“’")  =  0p(7=f)  =  °p(1)' 

since $nw_n^2 \to \infty$. The third term in (6.4) is uniformly bounded by

0p{'%)  E | ERA'<-' e  |  s  oP(-h)  Op(v^)  =  op(i). 


Hence, $\sqrt{n}(\hat\Gamma - \Gamma^*)$ has the same form as $\sqrt{n}(\hat\Lambda - \Lambda^*)$ except that $r_t$ replaces $\varepsilon_t$. Note that $r_t$ is a martingale difference and $E(r_t^2 \mid X_{t-1} = x) = \nu(x)$. Also, the condition $E X_0^6 < \infty$ implies that $\sup_t E(r_t^3) < \infty$. Therefore, the result follows by the proof of Theorem 2.1. □
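
The estimators handled in the two proofs above can be illustrated numerically. The sketch below assumes the usual regressogram form: cell-wise sample means and variances of $X_t$ given the cell containing $X_{t-1}$, integrated over cells of equal width. The function name `cumulative_regressogram`, the interval $[-1, 1]$, the number of cells, and the linear AR(1) test model are all hypothetical choices for illustration, not taken from the paper.

```python
import random

def cumulative_regressogram(x, a, b, n_cells):
    """Cumulative regressogram estimates on [a, b) from a series x[0..n].

    Returns cell edges and running integrals of the cell-wise conditional
    mean (Lam) and conditional variance (Gam) of X_t given X_{t-1}.
    """
    w = (b - a) / n_cells                  # common cell width w_n
    counts = [0] * n_cells
    s1 = [0.0] * n_cells                   # sums of X_t per cell
    s2 = [0.0] * n_cells                   # sums of X_t^2 per cell
    for prev, curr in zip(x[:-1], x[1:]):
        j = int((prev - a) // w)           # index of the cell containing X_{t-1}
        if 0 <= j < n_cells:
            counts[j] += 1
            s1[j] += curr
            s2[j] += curr * curr
    Lam, Gam = [0.0], [0.0]
    for j in range(n_cells):
        if counts[j] > 0:
            lam = s1[j] / counts[j]                  # cell mean
            gam = s2[j] / counts[j] - lam * lam      # cell variance
        else:
            lam = gam = 0.0                          # empty cell contributes nothing
        Lam.append(Lam[-1] + lam * w)
        Gam.append(Gam[-1] + gam * w)
    edges = [a + j * w for j in range(n_cells + 1)]
    return edges, Lam, Gam

# Hypothetical test model: X_t = 0.5 X_{t-1} + eps_t with unit-variance noise,
# so lambda(x) = 0.5 x and gamma(x) = 1.
random.seed(0)
x = [0.0]
for _ in range(50_000):
    x.append(0.5 * x[-1] + random.gauss(0.0, 1.0))

edges, Lam, Gam = cumulative_regressogram(x, a=-1.0, b=1.0, n_cells=20)
```

For this test model the targets are $\Lambda(z) = 0.25(z^2 - 1)$ and $\Gamma(z) = z + 1$ on $[-1, 1]$, so `Lam` and `Gam` can be compared with these curves at the cell edges.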


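Proof of Theorem 3.1. Define $Q_n(\theta) = \sum_{t=1}^{n} [X_t - g(\theta, X_{t-1})]^2$ and $q(\theta) = E(X_1 - g(\theta, X_0))^2$. Note that

\[
\frac{1}{n}\bigl( Q_n(\theta) - Q_n(\zeta) \bigr)
= -\frac{2}{n} \sum_{t=1}^{n} X_t \bigl[ g(\theta, X_{t-1}) - g(\zeta, X_{t-1}) \bigr]
+ \frac{1}{n} \sum_{t=1}^{n} \bigl( g(\theta, X_{t-1}) + g(\zeta, X_{t-1}) \bigr) \bigl[ g(\theta, X_{t-1}) - g(\zeta, X_{t-1}) \bigr].
\]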

By Condition (B1), we have that

19(9,1)  -  ,(C,  x)|  <  \CI<(x)  +  lim,  i)||l||#  -  Cil- 


Hence, under the moment conditions in (B1) and (B3), and the ergodic theorem,

\[
\frac{1}{n} \bigl| Q_n(\theta) - Q_n(\zeta) \bigr| \le C \|\theta - \zeta\|,
\]

where $C$ is finite almost surely. It follows that $\{n^{-1} Q_n(\cdot)\}$ is equicontinuous. Again by the ergodic theorem, $n^{-1} Q_n(\theta) \xrightarrow{a.s.} q(\theta)$ $(< \infty)$, which implies that $\{n^{-1} Q_n(\cdot)\}$ is pointwise bounded almost surely. It follows by the Arzelà-Ascoli theorem that this family of functions is almost surely relatively compact in the space of continuous functions on $\Theta$. Thus $n^{-1} Q_n(\cdot)$ converges uniformly to $q(\cdot)$ on $\Theta$ almost surely. Since $q(\theta)$ has a unique minimum at $\theta_0 \in \Theta$, and $\hat\theta$ minimizes $Q_n(\theta)$, we conclude that $\hat\theta$ is consistent.

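Next, Taylor expanding $Q_n'$ about $\theta_0$, we can write

\[
\sqrt{n}(\hat\theta - \theta_0) = U_n / V_n(\theta^*),
\]

where $U_n = U_n^{(n)}$,

\[
U_k^{(n)} = \frac{1}{\sqrt{n}} \sum_{t=1}^{k} \varepsilon_t\, g'(\theta_0, X_{t-1}), \quad k = 1, \ldots, n,
\qquad
V_n(\theta) = \frac{1}{2n} Q_n''(\theta),
\]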

and $\theta^*$ is on the line joining $\theta_0$ and $\hat\theta$. Since $U_k^{(n)}$ is a martingale in $k$, the martingale central limit theorem can be used to show that $U_n \xrightarrow{D} N(0, S)$ under the moment conditions in (B3). To complete the proof we need to show that $V_n(\theta^*) \xrightarrow{P} V$.
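Routine algebra gives that

\[
\begin{aligned}
V_n(\theta^*) ={}& \frac{1}{n} \sum_{t=1}^{n} g'(\theta_0, X_{t-1})^{\otimes 2} \\
&+ \frac{1}{n} \sum_{t=1}^{n} \bigl[ g'(\theta^*, X_{t-1})^{\otimes 2} - g'(\theta_0, X_{t-1})^{\otimes 2} \bigr] \\
&+ \frac{1}{n} \sum_{t=1}^{n} \bigl( g(\theta^*, X_{t-1}) - g(\theta_0, X_{t-1}) \bigr)\, g''(\theta^*, X_{t-1}) \\
&- \frac{1}{n} \sum_{t=1}^{n} \varepsilon_t\, g''(\theta_0, X_{t-1}) \\
&+ \frac{1}{n} \sum_{t=1}^{n} \bigl[ X_t - g(\theta_0, X_{t-1}) \bigr] \bigl[ g''(\theta_0, X_{t-1}) - g''(\theta^*, X_{t-1}) \bigr].
\end{aligned}
\]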

By (B3) and the ergodic theorem, the first term converges to $V$ almost surely. Using $\theta^* \xrightarrow{a.s.} \theta_0$, Conditions (B1)-(B3) and the ergodic theorem, it can be shown that the second, third and last terms above converge almost surely to zero. A strong law of large numbers, see Hall and Heyde (1980, Theorem 2.19), together with (B2), (B3) and the martingale difference property of $\varepsilon_t$, gives that the fourth term also converges almost surely to zero. We conclude that $V_n(\theta^*) \xrightarrow{a.s.} V$. □
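
The consistency argument above can be illustrated with a small simulation of the least squares estimator that minimizes $Q_n(\theta)$. This is only a sketch under stated assumptions: the model $g(\theta, x) = \theta\, x\, e^{-x^2}$, the true value $\theta_0 = 1.5$, the Gaussian noise, and the helper names `h`, `Q` and `cls_estimate` are hypothetical and do not come from the paper. Because this $g$ is linear in $\theta$, the minimizer of $Q_n$ is available in closed form.

```python
import math
import random

def h(x):
    # Hypothetical regression shape, so that g(theta, x) = theta * h(x).
    return x * math.exp(-x * x)

def Q(theta, x):
    # Q_n(theta) = sum_t [X_t - g(theta, X_{t-1})]^2
    return sum((x[t] - theta * h(x[t - 1])) ** 2 for t in range(1, len(x)))

def cls_estimate(x):
    # Q_n is quadratic in theta for this g, so its minimizer is explicit.
    num = sum(x[t] * h(x[t - 1]) for t in range(1, len(x)))
    den = sum(h(x[t - 1]) ** 2 for t in range(1, len(x)))
    return num / den

random.seed(1)
theta0, n = 1.5, 20_000
x = [0.0]
for _ in range(n):
    x.append(theta0 * h(x[-1]) + random.gauss(0.0, 1.0))

theta_hat = cls_estimate(x)
print(theta_hat, Q(theta_hat, x) / n, Q(theta0, x) / n)
```

With a long series, `theta_hat` should lie close to `theta0`, and $n^{-1} Q_n(\hat\theta)$ should not exceed $n^{-1} Q_n(\theta_0)$, in line with the uniform convergence of $n^{-1} Q_n$ to $q$.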


Proof of Theorem 3.2. By Taylor expanding $g(\cdot, u)$ about $\theta_0$ for each fixed $u$,


y/n(A  -  A*)(z)  =  (^J  [ wnf(x )]  1  J  f*(u)g'(9*u,u)du  dx^j  y/n(8  -  90), 


where $\theta_u^*$ lies on the line joining $\theta_0$ and $\hat\theta$. Since $\hat\theta$ is a consistent estimator of $\theta_0$, and $g'$ is continuous,


J  K/(i)]  1  J^f*(u)g'(9*u,u)du 


From the proof of Theorem 3.1, $\sqrt{n}(\hat\theta - \theta_0) = V^{-1} U_n + o_p(1)$, so using the proof of Theorem 2.1,


y/n{k  -  A)(z)  =  M(n,  ■ )(z )  -  rp(z)Un  +  oP(  1) 




uniformly in $z$. By a $D[a,b] \times R$ version of Lemma 4.1 of McKeague (1988), it suffices to show that $(M(n, \cdot), U_n)$ converges in distribution to $(m(\cdot), U_\infty)$, where

\[
m(z) = \int_a^z \sqrt{\gamma(x)/f(x)}\; dW(x),
\qquad
U_\infty = \int_{-\infty}^{\infty} g'(\theta_0, x) \sqrt{\gamma(x) f(x)}\; dW(x).
\]

The proofs of Theorems 2.1 and 3.1 give that $M(n, \cdot) \xrightarrow{D} m$ and $U_n \xrightarrow{D} U_\infty$. It only remains to show that the finite dimensional distributions of $(M(n, \cdot), U_n)$ converge to those of $(m(\cdot), U_\infty)$. This is done by applying the martingale central limit theorem to the vector-valued martingale consisting of $U^{(n)}$ and increments of $M(\cdot, z)$ over disjoint intervals in $z$. In particular, note that


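\[
\begin{aligned}
\langle M(\cdot, z), U^{(n)} \rangle_n
&= \frac{1}{n} \sum_{j:\, x_j \le z} f(x_j)^{-1} \sum_{t=1}^{n} I\{X_{t-1} \in \mathcal{I}_j\}\, g'(\theta_0, X_{t-1})\, \gamma(X_{t-1}) \\
&= \frac{1}{n w_n} \int_a^z \frac{g'(\theta_0, x)\gamma(x) + O(w_n)}{f(x)} \sum_{t=1}^{n} I\{X_{t-1} \in \mathcal{I}_x\}\, dx + o_p(1) \\
&\xrightarrow{P} \int_a^z g'(\theta_0, x)\, \gamma(x)\, dx = \mathrm{Cov}(m(z), U_\infty).
\end{aligned}
\]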


The Lindeberg conditions involving increments of $M(\cdot, z)$ have been checked in the proof of Theorem 2.1, and those involving the $p$ components of $U^{(n)}$ in the proof of Theorem 3.1. □


Proof of Theorem 4.1. Since $\lambda$ is Lipschitz and $\hat f$ converges uniformly in probability to $f$, which is bounded away from 0, we have

TX 

A(x,y)  =  A*(x, y)  +  0P(w*n )  +  (mx^/(x, y)]-1  <= 

t= 1 

Since  \/ntx*  ~  (nwV2)1^2  — *  0, 

v  n(A  -  A  )(^i ,  ^2)  =  7=  7  /  /  — - - *: - - - - - cfxdy  +  oP(l). 

Ja  Ja  f(x,y) 

The remainder of the proof is almost identical to the proof of Theorem 2.1 except that $\mathbf{X}_t$ replaces $X_t$, $\mathcal{I}_{xy}$ replaces $\mathcal{I}_x$, $w_n^2$ replaces $w_n$, double integral (summation)




replaces single integral (summation), and $\bar\phi$ is the piecewise linear approximation to $\phi$ determined by the cells $\mathcal{I}_{xy}$. Note that the condition $n w_n^2 \to \infty$ is used in checking the Lindeberg condition, and tightness can be checked by using a two-dimensional time parameter version of Theorem 15.6 of Billingsley (1986) given in Bickel and Wichura (1971). We omit the details. □

Proof of Theorem 4.2. The proof is similar to the proof of Theorem 3.2 and is omitted.